Comments for MEDB 5501, Week 13

What this talk will cover

  • Calculation of the covariance and correlation.
  • Interpretation of the correlation
  • Missing values
  • SPSS calculations of correlations
  • Spearman correlation
  • Large correlation matrices
  • Confidence intervals and hypothesis tests
  • Partial correlations

Covariance

  • \(Cov(X,Y)=\frac{1}{n-1}\Sigma(X_i-\bar{X})(Y_i-\bar{Y})\)
    • \((X_i-\bar{X})(Y_i-\bar{Y})\) is positive if
      • \(X_i\) and \(Y_i\) both above average
      • \(X_i\) and \(Y_i\) both below average
    • \((X_i-\bar{X})(Y_i-\bar{Y})\) is negative if
      • \(X_i\) above average and \(Y_i\) below average
      • \(X_i\) below average and \(Y_i\) above average

  x  y
 11 13
 15  9
 19 15
 21  7
 25 11
 29  5

\(\bar{X}=20\);

\(\bar{Y}=10\);

\(S_X=6.5\);

\(S_Y=3.7\)

Calculation of covariance

 x_centered y_centered product
         -9          3     -27
         -5         -1       5
         -1          5      -5
          1         -3      -3
          5          1       5
          9         -5     -45

  • \(Cov(X,Y)=\frac{1}{5}(-70)=-14\)
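
As a cross-check on the hand calculation, here is a minimal Python sketch (Python is not part of the SPSS workflow used in this talk) that reproduces the covariance from the six data pairs in the example. The variable names are illustrative only.

```python
# Data pairs from the example table
x = [11, 15, 19, 21, 25, 29]
y = [13, 9, 15, 7, 11, 5]

n = len(x)
x_bar = sum(x) / n  # 20.0
y_bar = sum(y) / n  # 10.0

# Cross-products of the centered values
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]

# The sample covariance divides by n - 1, not n
cov_xy = sum(products) / (n - 1)
print(cov_xy)  # -14.0
```

Each product is negative when one value is above its mean and the other is below, which is why the sum of cross-products (-70) is negative here.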

Correlation

  • \(Corr(X,Y)=\frac{Cov(X,Y)}{S_XS_Y}\)
    • Also use \(r_{XY}\)
    • Population correlation is \(\rho_{XY}\)
  • Other names
    • Pearson correlation
    • Product moment correlation

Calculation of correlation

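  • \(r_{XY}=\frac{-14}{6.5 \times 3.7}=-0.571929\)
    • Computed from the unrounded \(S_X\) and \(S_Y\); the rounded values 6.5 and 3.7 would give about \(-0.58\)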
    • Always round!
      • \(r_{XY}=-0.57\) or \(-0.6\)
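
The same calculation can be sketched in Python, assuming the six data pairs from the covariance example. This sketch carries full precision for \(S_X\) and \(S_Y\) and rounds only at the end, which is one way to avoid compounding rounding error.

```python
import math

# Data pairs from the example table
x = [11, 15, 19, 21, 25, 29]
y = [13, 9, 15, 7, 11, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample covariance and sample standard deviations (divisor n - 1)
cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))

# Correlation is the covariance divided by the product of the SDs
r_xy = cov_xy / (s_x * s_y)
print(round(s_x, 1), round(s_y, 1))  # 6.5 3.7
print(round(r_xy, 2))                # -0.57
```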

Break #1

  • What have you learned
    • Calculation of the covariance and correlation.
  • What is coming next
    • Interpretation of the correlation

Interpretation of correlation

  • r is always between -1 and +1
    • Positive values imply positive association
    • Negative values imply negative association
    • Strongest associations closest to -1 or +1

r between -1 and -0.7, strong negative association

r between -0.7 and -0.3, weak negative association

r between -0.3 and +0.3, little or no association

r between +0.3 and +0.7, weak positive association

r between +0.7 and +1, strong positive association
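
The rule of thumb above can be written as a small lookup function. This is only a sketch: the function name is made up for illustration, and the handling of values exactly at the cutoffs (plus or minus 0.3 and 0.7) is an assumption, since the ranges above do not say which side they belong to.

```python
def describe_correlation(r):
    """Return the rule-of-thumb label for a correlation r.

    Assumption: exactly +/-0.7 counts as strong, and exactly +/-0.3
    counts as little or no association.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    if r <= -0.7:
        return "strong negative association"
    if r < -0.3:
        return "weak negative association"
    if r <= 0.3:
        return "little or no association"
    if r < 0.7:
        return "weak positive association"
    return "strong positive association"

print(describe_correlation(-0.57))  # weak negative association
```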

Extreme case, perfect association

Break #2

  • What have you learned
    • Interpretation of the correlation
  • What is coming next
    • Missing values

Sleep data dictionary, 1 of 6

---
data_dictionary:
  sleep.txt
  
source:
  This dataset is part of the Australasian Data and
  Story Library (OZDASL). Please cite this data as
  Smyth, GK (2011). Australasian Data and Story 
  Library (OzDASL). http://www.statsci.org/data.
  The data comes originally from Allison, T., and
  Cicchetti, D. V. (1976). Sleep in mammals:
  ecological and constitutional correlates. 
  Science 194 (November 12), 732-734.

Sleep data dictionary, 2 of 6

description:
  This dataset has information about sleep patterns
  in 62 common mammals, along with other information
  that might help you understand what influences
  variations in sleep.
  
download:
  text-format: http://www.statsci.org/data/general/sleep.txt
  additional-information: http://www.statsci.org/data/general/sleep.html

copyright:
  There is no information about the copyright for this
  dataset. You should, however, be able to use this
  data for individual educational purposes under the
  Fair Use guidelines of U.S. copyright law.

Sleep data dictionary, 3 of 6

format: 
  delimiter: tab
  varnames: included in the first row of data
  missing-value-code: NA
  rows: 62
  columns: 11

Sleep data dictionary, 4 of 6

vars:
  Species:
    label: Species of mammal
    
  BodyWt:
    label: Body weight
    unit: kg
    
  BrainWt:
    label: Brain weight
    unit: g
    

Sleep data dictionary, 5 of 6

  NonDreaming:
    label: Time spent in non-dreaming sleep
    unit: hours
    
  Dreaming:
    label: Time spent in dreaming sleep
    unit: hours
    
  TotalSleep:
    label: Total time spent in sleep
    unit: hours
    
  LifeSpan:
    unit: years
    

Sleep data dictionary, 6 of 6

  Gestation:
    unit: days

  Predation:
    scale: likert
    range: 1-5

  Exposure:
    scale: likert
    range: 1-5
    
  Danger:
    scale: likert
    range: 1-5
---

What does a missing value represent

  • Dropout
  • Refusal to answer a survey question
  • Survey question is not applicable
  • Lab result is lost
  • Concentration below detectable limit
  • Many other reasons

Common missing value codes

  • A single dot (.)
    • SPSS and SAS
  • NA
    • R
  • Asterisk (*) and other symbols
  • Unusual number codes (-1, 9, 99, 999)

Importing missing values

  • No problems if the file uses the software’s default missing value code
  • Codes like NA and * cause a numeric variable to be imported as a string
    • Fix during import, or
    • Convert back after import
  • Unusual number codes
    • Designate after import
    • Don’t forget!
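
Outside SPSS, the "fix during import" option might look like the following Python sketch. It assumes a tab-delimited file with NA as the missing value code, matching the sleep data dictionary. The sample rows and the helper `to_number` are hypothetical, not taken from sleep.txt.

```python
import csv
import io

# Hypothetical tab-delimited sample, NOT actual rows from sleep.txt
raw = "Species\tBodyWt\tDreaming\nSpeciesA\t1.5\t2.0\nSpeciesB\t250\tNA\n"

def to_number(text, missing_code="NA"):
    """Convert a field to float, mapping the missing value code to None."""
    return None if text == missing_code else float(text)

rows = []
for record in csv.DictReader(io.StringIO(raw), delimiter="\t"):
    rows.append({
        "Species": record["Species"],
        "BodyWt": to_number(record["BodyWt"]),
        "Dreaming": to_number(record["Dreaming"]),
    })

# Count how many values of Dreaming are missing
n_missing = sum(row["Dreaming"] is None for row in rows)
print(n_missing)  # 1
```

Converting the code at import keeps the variable numeric, so there is nothing to convert back afterwards.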

Imputing missing values, 1 of 2

  • Several simple (simplistic?) imputation choices
    • No news is bad news
    • No news is good news
    • No news is average news (MCAR)
    • No news is last week’s news (LOCF)

Imputing missing values, 2 of 2

  • Rigorous approaches (beyond the scope of this class)
    • Missing at random (MAR), Missing not at random (MNAR)
    • Ignorable, Non-ignorable
    • Single/Multiple imputation
    • Maximum likelihood/Bayesian approaches
  • You cannot ignore missingness, you cannot avoid imputation

SPSS investigation of missing data, 1 of 2

SPSS investigation of missing data, 2 of 2

Missing value approaches for correlations, 1 of 2

\[\begin{matrix} A_1 & B_1 & C_1\\ A_2 & B_2 & C_2\\ A_3 & B_3 & C_3\\ A_4 & B_4 & C_4\\ A_5 & B_5 & C_5\\ A_6 & B_6 & .\\ \end{matrix}\]

Missing value approaches for correlations, 2 of 2

  • Listwise deletion (complete case analysis)
    • Use 5 pairs for \(r_{AB}\), \(r_{AC}\), and \(r_{BC}\)
  • Pairwise deletion
    • Use 5 pairs for \(r_{AC}\) and \(r_{BC}\)
    • Use all 6 pairs for \(r_{AB}\)
  • My recommendation
    • Pairwise deletion for descriptive statistics
    • Multiple imputation for inferential statistics
    • Never use complete case analysis
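
To make the difference in sample sizes concrete, here is a sketch in Python. The numbers are hypothetical; only the layout (six rows, with the sixth value of C missing) follows the matrix shown earlier.

```python
from itertools import combinations

# Hypothetical values; None marks the missing sixth value of C
data = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 1, 4, 3, 6, 5],
    "C": [5, 3, 4, 1, 2, None],
}

def pearson(u, v):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5

n_rows = len(data["A"])

# Listwise: keep only rows with no missing value in any variable
complete = [i for i in range(n_rows)
            if all(data[k][i] is not None for k in data)]

pair_counts = {}
for p, q in combinations(data, 2):
    # Pairwise: keep rows where both variables in this pair are observed
    keep = [i for i in range(n_rows)
            if data[p][i] is not None and data[q][i] is not None]
    pair_counts[p + q] = len(keep)
    r = pearson([data[p][i] for i in keep], [data[q][i] for i in keep])
    print(p + q, "pairwise n =", len(keep),
          "listwise n =", len(complete), "r =", round(r, 2))
```

Only the A-B pair gains a row under pairwise deletion, because row 6 is complete for A and B but not for C.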

Break #3

  • What have you learned
    • Missing values
  • What is coming next
    • SPSS calculations of correlations

SPSS correlations with pairwise deletion

SPSS analysis with listwise deletion

SPSS analysis, scatterplot matrix

SPSS analysis, 6 of 9

SPSS analysis with a log transformation

Break #4

  • What have you learned
    • SPSS calculations of correlations
  • What is coming next
    • Spearman correlation

Spearman correlation

  x  y rank_x rank_y
 11 13      1      5
 15  9      2      3
 19 15      3      6
 21  7      4      2
 25 11      5      4
 29  5      6      1
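
The Spearman correlation is the Pearson correlation computed on the ranks. When there are no ties, as in this example, it can also be found with the shortcut \(r_s=1-\frac{6\Sigma d_i^2}{n(n^2-1)}\), where \(d_i\) is the difference between the two ranks. A minimal Python sketch, with an illustrative helper `ranks` that assumes no tied values:

```python
# Data pairs from the example table
x = [11, 15, 19, 21, 25, 29]
y = [13, 9, 15, 7, 11, 5]

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no ties, as in this example."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

rank_x, rank_y = ranks(x), ranks(y)
n = len(x)

# Shortcut formula, valid only when there are no ties
d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r_s = 1 - 6 * d_squared / (n * (n ** 2 - 1))

print(rank_y)         # [5, 3, 6, 2, 4, 1]
print(round(r_s, 2))  # -0.6
```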

When to use the Spearman correlation

  • Similar to considerations for other nonparametric tests
    • Non-normal data
    • Small sample size
    • Ordinal data
  • Measures degree of monotonicity

SPSS Spearman correlations, 1 of 2

SPSS Spearman correlations, 2 of 2

Break #5

  • What have you learned
    • Spearman correlation
  • What is coming next
    • Large correlation matrices

SPSS large correlation matrix

Large correlation matrix after reduction, rounding

Large correlation matrix after further reduction

Break #6

  • What have you learned
    • Large correlation matrices
  • What is coming next
    • Confidence intervals and hypothesis tests

Confidence intervals and hypothesis tests

  • \(r_{XY}\) is a statistic, \(\rho_{XY}\) is a parameter.
    • \(H_0:\ \rho_{XY}=0\)
    • Accept \(H_0\) if \(r_{XY}\) is close to zero, or
    • Accept \(H_0\) if the confidence interval includes zero.
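
One common way to get a confidence interval for \(\rho_{XY}\) is Fisher's z transformation. The sketch below applies it to the earlier example (\(r_{XY}=-0.571929\), \(n=6\)). The function name is illustrative, the 1.96 multiplier assumes a 95% interval, and the approximation is rough for a sample this small. It is not meant as a description of what SPSS does internally.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate CI for rho using Fisher's z transformation."""
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # approximate standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

lower, upper = fisher_ci(-0.571929, 6)
includes_zero = lower < 0 < upper
print(round(lower, 2), round(upper, 2), includes_zero)  # -0.94 0.45 True
```

With only six pairs the interval is very wide and includes zero, so even a correlation of -0.57 is compatible with \(\rho_{XY}=0\).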

SPSS correlation confidence intervals

Break #7

  • What have you learned
    • Confidence intervals and hypothesis tests
  • What is coming next
    • Partial correlations

Partial correlation

  • \(\rho_{XY\cdot Z}=\frac{\rho_{XY}-\rho_{XZ}\rho_{ZY}} {\sqrt{1-\rho_{XZ}^2}\sqrt{1-\rho_{ZY}^2}}\)
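
The formula translates directly into code. In this Python sketch the function name and the three input correlations are hypothetical, chosen only to show the arithmetic.

```python
import math

def partial_corr(r_xy, r_xz, r_zy):
    """Correlation of X and Y after adjusting both for Z."""
    numerator = r_xy - r_xz * r_zy
    denominator = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_zy ** 2)
    return numerator / denominator

# Hypothetical correlations: X and Y each correlate 0.6 with Z
print(round(partial_corr(0.5, 0.6, 0.6), 3))  # 0.219
```

In this made-up case much of the 0.5 correlation between X and Y is accounted for by their shared association with Z. If Z is uncorrelated with both X and Y, the partial correlation equals the original correlation.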

SPSS partial correlation

Summary

  • Calculation of the covariance and correlation.
  • Interpretation of the correlation
  • Missing values
  • SPSS calculations of correlations
  • Spearman correlation
  • Large correlation matrices
  • Confidence intervals and hypothesis tests
  • Partial correlations